Chapter 4: Implementation and Validation of M-Statistic Real-Time Spatial Detection

Author

  • Christopher A. Cassa
Abstract

Syndromic surveillance systems, especially software systems, have emerged as the leading outbreak detection mechanisms. Early outbreak detection systems can assist with medical and logistic decision support. One important concern for effectively testing these systems in practice is the scarcity of authentic outbreak health data. Because of this shortage, creating suitable geotemporal test clusters for surveillance algorithm validation is essential. Described is an automated tool that creates artificial patient clusters by varying a wide range of realistic outbreak parameters. The cluster creation tool is an open-source program that accepts a set of outbreak parameters and creates artificial geospatial patient data for a single cluster or a series of similar clusters. This helps automate the process of rigorous testing and validation of outbreak detection algorithms. Using the cluster generator, single patient clusters and series of patient clusters were created as files and series of files containing patient longitude and latitude coordinates. These clusters were then tested and validated using a publicly available GIS visualization program. All generated clusters were properly created within the ranges that were entered as parameters at program execution. Sample semi-synthetic datasets from the cluster creation tool were then used to validate a popular spatial outbreak detection algorithm, the M-Statistic.

Thesis Supervisor: Peter Szolovits
Title: Professor of Computer Science and Engineering
Research Supervisor: Kenneth Mandl
Title: Research Director, Center for Biopreparedness at Children's Hospital Boston

Bioterrorism Detection Cluster Creation Tool

ABSTRACT
CHAPTER 1: INTRODUCTION TO SYNDROMIC SURVEILLANCE
    Evolution and Development of Syndromic Surveillance Systems
    Geocoding of patient addresses
    Calculating Sensitivity and Specificity of Outbreak Detection Systems
    Datasets for benchmarking performance
CHAPTER 2: PARAMETERIZING AN OUTBREAK
  METRICS FOR DETECTION PERFORMANCE
  GIS DATAPOINTS AND EARTH SURFACE MEASUREMENTS
    Using the Haversine Formula
    Distance Measurement Methods
  BASICS OF DATE ALGORITHMS IMPLEMENTED IN CLUSTER GENERATOR
  PARAMETERS FOR CLUSTER CREATION TOOL INPUT PARAMETERS: SINGLE CLUSTER
  PARAMETERS FOR CLUSTER CREATION TOOL
  INPUT PARAMETERS: SETS OF CLUSTERS
  OUTPUT FILES FROM CLUSTER CREATION TOOL
  CLUSTER GENERATOR USER INTERFACE
  RESULTS OF CLUSTER GENERATION
  SAMPLE PROGRAM INPUT AND OUTPUT:
    Example 1: Creating a Single Patient Cluster:
    Example 1: CSV Text Output Sample:
    Example 1: Microsoft MapPoint GIS Mapping Output for Single Cluster:
    Example 2: Creating a Series of Clusters (Varying Angle)
  CREATING A SET OF OUTBREAK DATASETS FOR SPATIAL DETECTION ALGORITHM VALIDATION
  EVALUATING THE ACCURACY AND UNIFORMITY OF GENERATED CLUSTERS
CHAPTER 4: IMPLEMENTATION AND VALIDATION OF M-STATISTIC REAL-TIME SPATIAL DETECTION
  DETECTION STRATEGIES
  TEST CLUSTERS USED FOR M-STATISTIC VALIDATION
  M-STATISTIC DETECTION RATES BY STRATEGY
CHAPTER 5: DISCUSSION: PRESENT RESULTS AND FUTURE DEVELOPMENT
  USE OF THE CLUSTER CREATION TOOL
  PERTINENT SYNDROMIC SURVEILLANCE FUTURE OBJECTIVES
    Modeling the Super-Spreader Phenomenon
    Nearest Neighbor Mapping
    Relating cluster density proportionally with population density
    Cluster Location Distributions
  LIMITATIONS OF CLUSTER GENERATOR
  CONCLUSION
APPENDIX A: REAL-TIME M-STATISTIC DOCUMENTATION
APPENDIX B: AVAILABILITY AND PROGRAM REQUIREMENTS
  LIST OF ABBREVIATIONS
  ACKNOWLEDGEMENTS
  REFERENCES:

Chapter 1: Introduction to Syndromic Surveillance

Concerns about newly emerging diseases and the possibility of bioterrorism attacks, coupled with increasing availability of real-time public health data, have led researchers to develop new suites of outbreak detection systems [1-2, 7-9].
Such systems can, in principle, give early warning of an outbreak, which can in turn help clinicians and public health officials make better medical and logistical decisions. To achieve the greatest accuracy, such systems would rely on confirmed disease diagnoses. However, the need to make quick judgments and the huge potential benefits of recognizing an outbreak early in its course have favored the creation of systems that rely on recognition of common syndromes (e.g., "influenza-like illness") rather than waiting for definite diagnoses. Most systems under active study, therefore, are based on syndromic surveillance.

The fundamental goal of syndromic surveillance systems is to detect a small increase in cases of one type of outbreak in a shorter time frame than astute physicians or medical administrators would likely need to notice it [7]. Several groups have quantified a range of detection goals, and some groups are attempting to discover small outbreaks (around fifteen extra visits or cases) within three days of the first abnormal visit to a clinic or emergency department [11].

Most syndromic surveillance systems to date focus on the number of cases of various syndromes detected and the clustering of these cases in time. Although these data are of obvious importance in outbreak detection, we also concentrate on another aspect of the data: spatial clustering. Indeed, it appears that the combined use of temporal and spatial clustering in surveillance data provides a better opportunity to recognize possible outbreaks rapidly.

Evaluation of syndromic surveillance systems is made difficult by the fact that, fortunately, actual outbreaks are rare. This thesis therefore addresses the challenging problem of creating realistic spatio-temporal syndromic data that can be used to understand the characteristics of surveillance systems.

Evolution and Development of Syndromic Surveillance Systems

Syndromic surveillance systems have evolved a great deal since their emergence in the late 1990s. There are numerous types and implementations of these real-time outbreak detection systems, and they rely on a multitude of data sources [25]. Data streams range from sources as basic as primary-care visit records and emergency department admission records to over-the-counter medication sales, web-based medical system visits, public and private-sector health-hotline calls, and even orange juice sales [14-16]. These sources have been selected on the premise that the earliest signs of a potential outbreak are aberrations in the expected numbers of visits or relevant purchase volumes at the time of outbreak analysis.

For surveillance systems to be most useful, they should produce alerts in as timely a fashion as possible. Near real-time data acquisition and analysis is quickly becoming standard practice. One example of a real-time syndromic surveillance system is the Automated Epidemiologic Geotemporal Integrated Surveillance (AEGIS) system [1], software that tracks patient emergency department visits using live data streams from hospitals in the northeastern region. Patient address data is automatically geocoded and chief complaints are encoded into syndromic categories shortly after the patient has been admitted to a participating emergency department.

Software systems that conduct syndromic surveillance often use large existing hospital or retail databases to establish baseline parameters and covariate values. Those baseline measurements are used to find expected visit ranges for a specific time frame, which are then compared against directly observed values in a recent time frame. A deviation of a given size from the expected value occurs with a specific likelihood.
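
The comparison of observed counts against a baseline expectation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any system named in the text: it assumes visit counts follow a Poisson distribution around the expected value, and the function names (poisson_upper_tail, raise_flag) and the threshold alpha are hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    The Poisson model is an assumption made for this sketch only.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for i in range(observed):
        cdf += term                  # accumulate P(X = i)
        term *= expected / (i + 1)   # step to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def raise_flag(observed: int, expected: float, alpha: float = 0.01) -> bool:
    """Flag the count when its tail likelihood falls below alpha.

    alpha is a hypothetical cut-off, not a value taken from the text.
    """
    return poisson_upper_tail(observed, expected) < alpha


# Fifteen extra visits on top of an assumed baseline of 20 expected visits.
print(raise_flag(35, 20.0))
# Two extra visits on the same assumed baseline.
print(raise_flag(22, 20.0))
```

A real system would estimate the expected value from historical data and covariates rather than take it as a fixed number, and might use a different count model entirely.
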
If computed likelihood values are low enough (that is, low enough to correspond to high confidence that the patient distribution is abnormal), then outbreak flags are raised. These so-called red flags should cause medical informatics professionals to further analyze a potentially emerging situation.

To increase the efficacy of surveillance systems, some organizations that track detailed patient data have chosen to group patients into syndrome categories. Several surveillance systems have found increases in the sensitivity of outbreak detection and data analysis when the datasets being analyzed are grouped by syndrome [2, 24]. The national ESSENCE project [9] has developed a categorization system that assigns a subset of ICD-9 (International Classification of Diseases, 9th Revision) diagnostic codes to each syndrome used for syndrome grouping. A list of the ESSENCE syndrome groups is shown in Table 1. Another system, the Realtime Outbreak and Disease Surveillance (RODS) project [8], has created a free-text Complaint Coder (CoCo), which takes a patient's chief complaint and assigns it to a syndromic category using a Bayesian classification scheme. Both of these systems then conduct surveillance analysis on separate groups of patients with similar syndromes.

Table 1: Syndrome Groups and Diagnoses from the ESSENCE project.
Source: http://www.geis.ha.osd.mil/GEIS/surveillanceactivities/ESSENCE/essenceinstructions.asp

Syndrome Groups            Representative Diagnoses
Respiratory                cough, pneumonia, upper respiratory infection
Gastrointestinal           vomiting, diarrhea
Neurological               meningitis, botulism-like symptoms
Dermatologic Hemorrhagic   petechiae, bruising
Dermatologic Infectious    vesicular rashes
Fever                      unspecific fever, sepsis
Coma                       coma, sudden death

There has also been a good deal of discussion regarding hardware implementations for real-time surveillance, but they appear to be more difficult than software solutions in the short term. Effective physical detection of threats is much harder because it requires greater infrastructure and expense. Software detection systems also generally rely on information that is already encoded into computer systems, while hardware detection systems generally derive their data from new sources.

Data availability and variability issues make real-time temporal data analysis difficult, but these systems have the potential to provide the sensitivity to detect a variety of public health outbreaks [10]. Temporal data filtering and smoothing is an expanding area of biosurveillance research which promises to address many of the basic data variability concerns that interfere with optimal outbreak detection.

A number of biosurveillance systems have also started to integrate some notion of the spatial clustering of patients in area neighborhoods (and aberrations from normal levels of clustering) into their outbreak detection techniques. Two prominent techniques in the field that quantify how far spatial clustering departs from normal spatial distributions are the M-Statistic [13] (described in more detail later) and the Spatial and Space-Time Scan Statistic, SaTScan [11]. The M-Statistic technique uses the deviation of the current distribution of inter-point distances (the distances between each patient and every other patient) from the distribution that would normally be expected as a metric for spatial closeness. SaTScan uses either a Poisson-based model, a Bernoulli model, or a space-time permutation model, using user-provided patient visit data as the source of the underlying expected distribution.

Geocoding of patient addresses

All addresses are represented by latitude-longitude pairs in the cluster creation tool and in the implemented spatial detection algorithm. This standard was chosen because it is the most general and it avoids ambiguities that may arise from duplicate addresses or artificial boundaries such as zip codes. Patient addresses are geocoded into this format and then patients are categorized by their chief complaints (which are almost always recorded and coded by medical institutions) [20-23]. This allows doctors or medical practitioners to determine whether there is a correlation between patient location, time of visit, and symptoms. If this information is gathered for all patients at all times, it is easier to detect an outbreak when it occurs, because the outbreak data can be compared against expected baseline data from similar time periods [25].

Calculating Sensitivity and Specificity of Outbreak Detection Systems

The quality of a detection system can only be assessed by observing its behavior on test cases of interest. To do so, we need to find or create valid test clusters of patients with disease and then measure how well our system is able to detect those patients. Detection efficacy is usually measured in terms of the sensitivity and specificity of the
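
The inter-point distances that the M-Statistic description refers to can be computed from latitude-longitude pairs as in the following Python sketch. It is not the M-Statistic itself: it only builds the empirical distribution of pairwise distances, which a method of that kind would then compare against an expected distribution. The great-circle (haversine) formula, the Earth radius constant, the bin edges, the sample coordinates and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def interpoint_distances(points):
    """Distance between each patient and every other patient (each pair once)."""
    return [haversine_km(p, q) for p, q in combinations(points, 2)]


def distance_histogram(distances, bin_edges):
    """Fraction of distances in each half-open bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for d in distances:
        for i in range(len(counts)):
            if bin_edges[i] <= d < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = len(distances)
    return [c / total for c in counts] if total else [0.0] * len(counts)


# Hypothetical geocoded patient locations (lat, lon), for illustration only.
patients = [(42.3601, -71.0589), (42.3656, -71.0096),
            (42.3370, -71.1053), (42.4072, -71.3824)]
distances = interpoint_distances(patients)
print(distance_histogram(distances, [0.0, 5.0, 10.0, 50.0]))
```

Comparing such a histogram for the current time window against one built from baseline data is the general idea the text describes; how the deviation is scored is specific to the detection method.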

Similar resources

Quantification of Hepatitis C Viral Load Using an In-House Real-Time PCR Method in Patients Infected with Hepatitis C in Khorramabad

Background : Molecular diagnostic methods are among major tools in management of hepatitis C virus (HCV) in infected patients. Many studies have shown that viral load is associated with stage of infection and response to treatment. Therefore, the evaluation and quantification of viral load is very important. The goal of this study is implementation of inexpensive, yet accurate method for quanti...

Real-Time Structuring of GIS Input Spatial Data with Emphasis on Road Features

An important issue in implementation of a GIS system is preparation of data to be entered in GIS. To produce spatial data for GIS using photogrammetric techniques, conventional method is to apply photogrammetric and GIS systems individually (off-line procedure). This approach is costly, time consuming and somehow unreliable due to the fact that 3D photogrammetric model is not available at the ...

Accurate Fruits Fault Detection in Agricultural Goods using an Efficient Algorithm

The main purpose of this paper was to introduce an efficient algorithm for fault identification in fruit images. First, the input image was de-noised using a combination of Block Matching and 3D filtering (BM3D) and a Principal Component Analysis (PCA) model. Afterward, in order to reduce the size of the images and increase the execution speed, a refined Discrete Cosine Transform (DCT) algorithm was uti...

Zoning Electrical Conductivity and Acidity of Groundwater through Using Geo-statistical Method: A Case Study in Semirom Plain, Esfahan Province

Groundwater quality research is important, and groundwater pollution control has been covered in some of the research literature. Groundwater quality varies spatially and temporally, so classical statistics cannot account for these variations in regional-scale research. This study used geostatistical methods to optimize an interpolation method in order to estimate the spatial distribution of pH...

Real-Time intrusion detection alert correlation and attack scenario extraction based on the prerequisite consequence approach

Alert correlation systems attempt to discover the relations among alerts produced by one or more intrusion detection systems to determine the attack scenarios and their main motivations. In this paper a new IDS alert correlation method is proposed that can be used to detect attack scenarios in real-time. The proposed method is based on a causal approach due to the strength of causal methods in ...

Publication date: 2014